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Abstract Genome sequence analysis of Middle East respiratory syndrome coronavirus 
(MERS-CoV) variants from patient specimens has revealed the evolutionary dynamics and 
mechanisms of pathogenesis of the virus. However, most studies have analyzed the 
consensus sequences of MERS-CoVs, precluding an investigation of intrapatient 
heterogeneity. Here, we analyzed non-consensus sequences to characterize intrapatient 
heterogeneity in cases associated with the 2015 outbreak of MERS in South Korea. Deep- 
sequencing analysis of MERS-CoV genomes performed on specimens from eight patients 
revealed significant intrapatient variation; therefore, sequence heterogeneity was further 
analyzed using targeted deep sequencing. A total of 35 specimens from 24 patients 
(including a super-spreader) were sequenced to detect and analyze variants displaying 
intrapatient heterogeneity. Based on the analysis of non-consensus sequences, we 
demonstrated the intrapatient heterogeneity of MERS-CoVs, with the highest level in the 
super-spreader specimen. The heterogeneity could be transmitted in a close association 
with variation in the consensus sequences, suggesting the occurrence of multiple MERS- 
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CoV infections. Analysis of intrapatient heterogeneity revealed a relationship between 
D510G and 1529T mutations in the receptor-binding domain (RBD) of the viral spike 
glycoprotein. These two mutations have been reported to reduce the affinity of the RBD 
for human CD26. Notably, although the frequency of both D510G and 1529T varied 
greatly among specimens, the combined frequency of the single mutants was consistently 
high (87.7% + 1.9% on average). Concurrently, the frequency of occurrence of the wild 
type at the two positions was only 6.5% + 1.7% on average, supporting the hypothesis 
that selection pressure exerted by the host immune response played a critical role in 
shaping genetic variants and their interaction in human MERS-CoVs during the outbreak. 
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INTRODUCTION 


Middle East respiratory syndrome coronavirus (MERS-CoV) was first isolated from a patient in 
Saudi Arabia in 2012 and has been shown to cause severe acute respiratory illness, including 
fever, cough, and shortness of breath (Zaki et al. 2012). As of March 1, 2016, 1638 laboratory- 
confirmed cases (587 deaths; 36% case fatality rate [CFR]) have been reported to the World 
Health Organization. A South Korean outbreak of MERS began in May 2015, and its transmis- 
sion continued until early July, resulting in 186 laboratory-confirmed cases with 38 deaths 
(20.4% CFR). In contrast to previous studies, which have suggested limited person-to-person 
transmissibility of MERS-CoV (Breban et al. 2013; Cotten et al. 2013b), many secondary and 
tertiary cases of transmission occurred during the South Korean outbreak. Importantly, more 
than half of the tertiary cases were transmitted from one particular super-spreader, called 
Patient 14 in this study. This unusual transmission pattern raised questions related to trans- 
missibility as well as the potential adaptations of MERS-CoV to the human host. To address 
these questions, several researchers have investigated MERS-CoV sequences. However, all 
previous studies on the South Korean outbreak have focused only on the consensus se- 
quences of MERS-CoVs (Wang et al. 2015; Kim et al. 2016a,b; Park et al. 2016; Seong 
et al. 2016). 

The evolutionary dynamics of RNA viruses are complex because of their high mutation 
rates, rapid replication rates, and large population sizes. Multiple rounds of replication of 
a given viral genome can generate a cloud of diverse variants or a heterogeneous viral pop- 
ulation. Previous reports on other RNA viruses have suggested that quasispecies diversity, 
rather than the selection of individual variants, correlates with pathogenicity and enables ad- 
aptation to new environments (Vignuzzi et al. 2006; Lauring and Andino 2010). The impor- 
tance of characterizing viral populations as swarms with similar variants has led several 
groups to use next-generation sequencing (NGS) technologies on clinical samples to ex- 
plore the complexity of the spectrum of mutant viruses, including HIV-1 and hepatitis B virus 
(Bushman et al. 2008; Eriksson et al. 2008; Margeridon-Thermet et al. 2009). Although the 
description of MERS-CoV quasispecies is far from completion, previous studies that recov- 
ered whole MERS-CoV genome sequences from dromedary camels revealed the existence 
of intrahost single-nucleotide variants (Briese et al. 2014; Borucki et al. 2016). 

To date, no intrahost MERS-CoV single-nucleotide variants have been identified in hu- 
mans, except by Cotton et al. (2013a), where data from one specimen showed the presence 
of non-consensus variants (<5%). Currently, it is hard to discern whether only consensus se- 
quences have been reported or whether human MERS-CoV sequences represent almost 
clonal virus populations within individual cases. By analyzing non—consensus sequences in 
35 specimens from 24 patients infected during the South Korean outbreak (2015), we dem- 
onstrated the existence of intrapatient heterogeneity and investigated its functional 
implications. 


RESULTS 


Patients and Sample Collection 

Among the 24 cases, 14 were patients, four were caregivers, and six were health-care work- 
ers (HCWs) (Table 1). Patient 14 was a second-generation case who had been exposed to the 
index patient. Exposure to Patient 14 led to a second wave of the outbreak, resulting in a 
total of 81 third-generation infections (Oh et al. 2015). Soecimens from 20 of these third-gen- 
eration cases were analyzed in this study (Supplemental Fig. $1). Patients 162, 164, and 169 
were HCWs and fourth-generation case patients who had been exposed to Patient 135. 
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Table 1. Characteristics of patients included in this study 


Transmission 


Patient Age/ Case Disease Source 
no. sex type severity Underlying disease Generation patient 
14 35/M Patient Severe None 2nd 1 
48 37/M Caregiver Moderate None 3rd 14 
50 80/F Patient Severe Posterior cerebral artery 3rd 

infarction 
61 55/M Caregiver Severe None 3rd 14 
62 30/M HCW Mild None 3rd 14 
66 42/F Patient Moderate Myelodysplastic syndrome 3rd 14 
68 54/F Caregiver oderate None 3rd 14 
15 62/M Patient oderate Rectal cancer 3rd 14 
77 63/M Patient Severe Necrotizing pancreatitis 3rd 14 
78 41/F HCW oderate None 3rd 14 
80 34/M Patient Severe alignant lymphoma 3rd 14 
99 47/M Caregiver oderate None 3rd 14 
100 32/F Patient oderate Neuroendocrine tumor 3rd 14 
101 84/M Patient Severe Renal cell carcinoma 3rd 14 
102 48/F Patient Moderate None 3rd 14 
103 65/M Patient Moderate Middle cerebral artery 3rd 14 

infarction 
134 67/F Patient Mild None 3rd 
135 32/M HCW Severe None 3rd 14 
155 42/F Patient Mild ypertrophic 3rd 

cardiomyopathy 
157 59/M Patient Severe Lung cancer 3rd 14 
162 32/M HCW Severe None 4th 135 
164 35/F HCW Moderate None 4th 135 
169 33/M HCW Moderate None 4th 135 
177 49/F Patient Severe alignant lymphoma 3rd 14 


HCW, health-care worker. 


To determine whether viral genomic variability was related to disease severity, patient 
cases were categorized as mild (symptomatic cases without pneumonia; n= 3), moderate 
(cases with pneumonia; n= 11), and severe (cases with respiratory failure or death; n= 10) 
(Table 1). In eight cases, multiple lower respiratory tract specimens were obtained 
(Supplemental Table $1). Specimens from patients 50, 77, and 135 were obtained before 
and after respiratory failure. 


Characterization of MERS-CoV Genome 


Specimens from eight patients who had been identified as MERS-CoV-positive were se- 
quenced. The samples were found positive by both upstream-of-the-envelope gene (upE) 
and open reading frame 1a (orf1a) real-time polymerase chain reaction (PCR) assays, with 
cycle threshold (Ct) values averaging 17.3 (15.4-22.6) and 18.3 (15.4-23.0), respec- 
tively (Supplemental Table S1). For each patient, a MERS-CoV consensus sequence that 
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indicated the major nucleotide at each genomic position was generated. The genome se- 
quences of the eight isolates differed from each other at only seven positions (Table 2). 
The nucleotide substitutions were all nonsynonymous and occurred in the orflab (n= 3) 
and S (n= 4) genes (Table 2). The sequences of the eight isolates had high nucleotide iden- 
tities (ranging from 99.96% to 100%, with 99.98% to 100% sequence identities in ORF 1a and 
1b [orflab] and 99.85% to 100% identities in the spike glycoprotein gene) with recently pub- 
lished sequences of MERS-CoV isolates from the outbreak in South Korea (Kim et al. 2015; Lu 
et al. 2015; Seong et al. 2016). The E, M, and N genes were 100% identical to the previously 
described MERS-CoV isolates from South Korea. The sequence from Patient 14 in this study 
differed from the sequence from a previous study at two positions (Seong et al. 2016). 
Because the frequencies of the major nucleotides at these positions were close to 50% in 
this study, the differences between the consensus sequences may have arisen from small 
technical variations in measurement. 

Using 105 previously published genome sequences of MERS-CoVs, including those from 
recent cases in South Korea, we analyzed the phylogenetic relationships of eight newly se- 
quenced isolates. The new sequences clustered together with the other MERS-CoV isolates 
from the 2015 South Korean outbreak (Supplemental Fig. $2; Kim et al. 2015; Lu et al. 2015; 
Seong et al. 2016). 


Presence of Intrapatient Heterogeneity in MERS-CoVs 

Analyzing the MERS-CoV genome sequences, we noticed that a significant number of nu- 
cleotide sites had mixed bases. Because deep sequencing not only generates a consen- 
sus sequence but also identifies non—consensus nucleotides at each position, Cotton et al. 
(2013a) analyzed nucleotide variants present at >1% frequency to determine the variants 
important for intrahost evolution. In the previous study, the nucleotide variants present at 
a frequency greater than the sequencing error rate (i.e., 1%) were considered to be true 
variants, although this estimate may be conservative for most applications. However, er- 
rors introduced during reverse transcription and PCR amplification can significantly distort 
estimates of allele frequencies, especially if a limited amount of RNA is used as an input. 
Thus, in this study, we evaluated the reproducibility of allele frequencies of non—consen- 
sus nucleotides. For this purpose, duplicate libraries generated by independent comple- 
mentary DNA (cDNA) synthesis using the same RNA samples were sequenced and 
compared. Allele frequencies of duplicate samples from Patients 80 and 162 were signifi- 
cantly correlated, implying that the majority of mixed bases were a reflection of intrapa- 
tient heterogeneity rather than a result of sequencing errors (Fig. 1A,B). 

Despite the relatively high reproducibility, we focused on nucleotide variants present at a 
relatively high frequency (i.e., >30%) to minimize false positives among non-consensus var- 
iants. Positions where a nucleotide variant was present at a frequency of >30% in any of the 
eight isolates were selected for further investigation using additional samples. Targeted 
deep sequencing was performed on an additional 27 samples targeting 3003-bp regions 
that included the variable sites. Eleven libraries from independent cDNA syntheses of five 
samples, including four libraries in duplicate and one in triplicate, were constructed to eval- 
uate the technical noise in measuring allele frequency. Only variants with frequencies signifi- 
cantly greater than those from technical noise were considered true variants. Technical noise 
is described in Methods and was based on a 5% significance level after performing 
Bonferroni adjustment. A total of 16 positions displaying intrapatient heterogeneity were 
identified through statistical testing. The results from seven specimens were validated using 
Sanger sequencing. We tested nine sites and confirmed concordant mixed bases at seven 
positions—namely, positions 6884, 7317, 11257, 21726, 22356, 22984, and 23041 (Fig. 
1C), but found discrepancies at positions 7322 and 19075. Although we were not able to 
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Figure 1. Reproducibility assessment for the detection of non-consensus variants. Correlation of allele fre- 
quencies between sample replicates from (A) Patient 80 and (B) Patient 162. (C) Representative validation 
by the Sanger sequencing method. 


validate all variants by Sanger sequencing owing to a limited sample volume, the results from 
the test set showed a 78% validation rate. In addition, the frequencies of the two variants at 
positions 22984 and 23041 (i.e., G at position 22984 and T at 23041; A at 22984 and C at 
23041) were highly correlated in the specimens (Supplemental Fig. S3), suggesting a tight 
linkage between these variants instead of random sequencing errors. 

Fourteen positions exhibiting mixed bases are presented in Figure 2. These include all 
seven nucleotide variants identified in the full-genome sequences of the eight MERS-CoV 
isolates, indicating that intrapatient MERS-CoV heterogeneity was closely associated with 
isolate-specific variants in the consensus sequences. Taken together, our data demonstrate 
the intrapatient heterogeneity of MERS-CoVs. 


Interactions Between Genetic Variants 
As described above, there was a tight linkage between positions 22984 and 23041—the 
variant at one position and the wild type at the other position or vice versa (i.e., D510G 
and 1529; D510 and |529T). Based on the genetic linkage, a functional relationship was 
expected between the variants at these two positions. In fact, these two mutations were 
located in one of the two major binding patches in the receptor-binding domain (RBD), 
where escape mutations from neutralizing antibodies were previously reported (Tang 
et al. 2014). 

Based on the tight correlation of base frequencies at positions 22984 and 23041, our 
data suggested that the double mutant carrying D510G and |529T is rare. Therefore, we 
selected sequencing reads covering both positions in each specimen to measure the 
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Figure 2. Summary of variable sites among 35 samples from 24 patients. At each position, a mixed base is 
displayed and colored if its frequency was >10%. For mixed bases, the font size reflects the frequency of 
that base. Mod, moderate; Sev, severe. 


frequencies of the double mutants (i.e., D510G, |529T), the single mutants (i.e., D510G, 1529 
and D510, 15297), and the wild type (i.e., D510, 1529). On average, the frequency of the dou- 
ble mutant was only 1.4% + 0.3%, whereas the single mutants and the wild type were present 
at 87.7% + 1.9% and 6.5% + 1.7%, respectively (Fig. 3). Recently, both D510G and I529T mu- 
tations in RBD were shown to reduce its affinity for human CD26 compared with the wild- 
type RBD (Kim et al. 2016c). Although the previous study did not test the binding affinity 
of the double mutant, in our data the two mutations are mutually exclusive, suggesting 
that the double mutant severely impairs viral fitness. Notably, although the frequency of 
each single mutant varied greatly among specimens, the combined frequency of both single 
mutants was consistently high in most samples. At the same time, the frequency of the wild 
type was no more than 10% in most samples (31 of 35 specimens). Thus, our data suggest 
that there was strong selection pressure favoring the variants of the spike glycoprotein 
with reduced affinity for the host receptor. 

Another notable nonsynonymous substitution was A107V (at position 11257), which oc- 
curred in the nonstructural protein 6 (nsp6) coding region within orflab. This A107V variation 
in nsp6 and the 1529T substitution in the spike glycoprotein were frequently found in the 
South Korean isolates and appeared to be genetically linked (Fig. 2; Supplemental Table 
$2). Nspé is a membrane-spanning protein and is an integral component of the viral replica- 
tion complex involved in double-membrane vesicle formation (Lundin et al. 2014). Although 
the functional importance of these MERS-CoV variants in viral replication and their interac- 
tion with the host immune system remain to be elucidated, our data showed that the selec- 
tion of these variants may not be independent from each other, suggesting the unit of 
selection may be a combination of variants rather than individual variants. 
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Figure 3. The fraction of mutant and wild-type alleles at positions 510 and 529 of the spike glycoprotein. To 
measure the frequencies of the mutant and the wild-type alleles, sequencing reads covering positions 510 and 
529 were selected. For each specimen, the read counts supporting each type were divided by the total read 
counts. The frequencies of the double mutants (D510G, 15297), the single mutants (D510G, 1529 and D510, 
15297), and the wild-type (D510, 1529) alleles are plotted for 35 specimens sorted by sample number. 


DISCUSSION 


In this study, we demonstrated the intrapatient heterogeneity of MERS-CoVs, ruling out the 
possibility of technical noise. Based on the statistical test described above, we first selected 
16 non-consensus variant candidates with frequencies significantly higher than those from 
technical noise. Among those variant candidates, we tested nine candidates by Sanger se- 
quencing and validated seven variants, resulting in a 78% validation rate. After removal of 
the two nonvalidated candidates, we ultimately listed a total of 14 non—-consensus variants, 
expecting an 89% validation rate. There was a tight correlation between the frequencies of 
the two variants at positions 22984 and 23041. These results suggest that technical artifacts 
were highly unlikely to have generated such non—consensus variants. Our analysis was limit- 
ed to relatively high-frequency variants because the sequencing error rate presents certain 
limitations for the study of RNA as opposed to DNA viruses. Technical advances such as the 
use of unique molecular identifiers to remove PCR errors may produce more accurate de- 
scriptions of MERS-CoV populations in patients (Kinde et al. 2011). Despite the limitations, 
we demonstrated the presence of heterogeneous MERS-CoV populations in clinical samples 
from MERS-CoV-infected patients. 

Given the large number of cases that resulted from exposure to Patient 14, it was intrigu- 
ing that Patient 14 displayed the highest intrapatient heterogeneity. Among the 10 single- 
nucleotide variants that were observed in more than one patient, six variants were present 
at frequencies >0% in Patient 14 (Fig. 2). Thus, the majority of intra- and interpatient hetero- 
geneity observed in different patients seemed to originate from the variants present in 
Patient 14. Our results showed that intrapatient heterogeneity can be transmitted in some 
cases, indicating that humans can be simultaneously infected with more than one MERS- 
CoV. 

Because most of the mixed bases observed in Patient 14 were not completely fixed in the 
subsequent generation of cases, the single-nucleotide variants might not have had signifi- 
cantly higher fitness than wild type. For example, the sequence from Patient 14 had a mixed 
base C/T at position 11257. Among sequences from patients exposed to Patient 14, a subset 
displayed either C or T alone with little intrapatient heterogeneity, whereas others showed 
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similar intrapatient heterogeneity to Patient 14. In addition, we could not find any significant 
differences in genetic variants based on disease severity group (Supplemental Tables $3 
and S4). Taken together, genetic variant composition in each patient varied, regardless of 
the transmissibility or disease severity, suggesting that transmission of individual genetic 
sequences was stochastic rather than selective. Even if the genetic variants had a selective 
advantage, the advantage of individual genetic sequences might have been weak and/or 
varied depending on the patient. 

Although we could not provide a clear picture of the functional interactions among dis- 
tinct genetic variants in the population, the high ratio of nonsynonymous to synonymous 
substitutions offered a clue to their functional impact. The 14 variants consisted of 12 non- 
synonymous and two synonymous substitutions. Assuming that the unit of selection might 
not be an individual variant but the population as a whole, we calculated dy (the number 
of nonsynonymous changes per nonsynonymous site) and dg (the number of synonymous 
changes per synonymous site) by the Pamilo—Bianchi-Li method (Li 1993; Pamilo and 
Bianchi 1993). The dn/ds ratio was 1.03, hinting at the possibility of positive selection. 
Notably, six of the 12 nonsynonymous substitutions were found in the spike glycoprotein- 
coding region. As quasispecies theory proposes, the unit of selection might not be an indi- 
vidual virus but the population as a whole. Our data showed that the frequencies of the sin- 
gle mutants (D510G, 1529 and D510, 15297) significantly fluctuated among specimens, but 
the combined frequency of the single mutants was consistently high in most samples, sug- 
gesting a combination of variants as the unit of selection. 

Recently, in an analysis of 13 MERS-CoV genomes associated with the 2015 outbreak in 
South Korea, Kim et al. (2016c) reported that 11 of those genomes had an |529T mutation in 
RBD, and one had a D510G mutation. The study showed that D510G and I529T mutations 
resulted in reduced affinity of RBD for the human CD26 receptor compared with the wild- 
type RBD. Based on the spread of a mutant MERS-CoV with reduced affinity for the receptor, 
the authors suggested that MERS-CoV adaptation during human-to-human spread might be 
driven by host immunological pressure such as neutralizing antibodies (Kim et al. 2016c). 
Consistent with the previous analysis, our data showed that the I529T and D510G mutations 
were observed in the consensus sequences from 29 and 4 of 35 samples, respectively. 
Furthermore, our analysis at an intrapatient level showed that the frequency of the wild 
type at both positions was only 6.5% + 1.7% on average, supporting a strong selection pres- 
sure favoring the variants over the wild type. 

Thus, we questioned whether the selection pressure was exerted by a host immune re- 
sponse such as neutralizing antibodies, as previously suggested (Kim et al. 2016c). In such 
cases, a reduction in host immune pressure might increase the frequency of the wild type. 
Four specimens displaying relatively high frequency (i.e., >10%) of the wild type belonged 
to Patients 77 and 80. We noticed dramatic changes in routine blood test results such as 
white blood cell (WBC) counts and C-reactive protein (CRP) levels in these patients. We 
analyzed 19 serial specimens from eight patients. For this, we used normalized WBC count 
values expressed as a percentage of the first of a series of samples from each patient. Serial 
samples from Patients 77 and 80 displayed a dramatic decrease in the normalized WBC 
count and a simultaneous increase in the frequency of the wild-type allele (Supplemental 
Fig. S4). Whereas WBC counts significantly decreased in those patients, the CRP level in- 
creased during the period of MERS-CoV infection, indicating an impaired immune response. 
Although these data are limited owing to the small sample size and the lack of direct mea- 
surement of host immunological pressure, our results suggest that the selection pressure ex- 
erted by the host immune response might favor variants with reduced affinity to the host 
receptor; therefore, a reduction in this selection pressure resulted in the expansion of viruses 
with the wild-type allele. The characterization and quantification of neutralizing antibodies in 
patients over time is required to determine their association with the mutants and to validate 
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the hypothesis. Moreover, more accurate descriptions of MERS-CoV populations within pa- 
tients will advance our understanding of the complex molecular evolution of MERS-CoVs. 
Because the evolution of a virus is a continuous process that takes place during intrapa- 
tient infection and interpatient transmission, the evolution of MERS-CoVs should be studied 
at both intra- and interpatient levels to obtain a better understanding of the principles under- 
lying the complex molecular evolution of MERS-CoVs in natural populations. However, our 
knowledge of MERS-CoV evolution primarily depends on analysis at the interpatient level, 
ignoring genetic diversity within individual patients. In this study, we demonstrated intrapa- 
tient heterogeneity in human MERS-CoV isolates. Based on the analyses of the genetic diver- 
sity of MERS-CoVs at both the intra- and interpatient levels, our results shed light on the 
evolutionary dynamics of MERS-CoVs associated with the South Korean outbreak. 


METHODS 


Collection of Clinical Specimens 

A total of 35 clinical specimens from 24 patients were determined positive for MERS-CoV 
and were included in this study. Clinical information and information about possible expo- 
sure to other MERS-CoV patients was collected from the electronic medical records and pub- 
licly available data from multiple sources including the South Korea Centers for Disease 
Control and Prevention and the South Korea Ministry of Health and Welfare. These data in- 
cluded age, gender, epidemiologic link, dates of suspected exposure to MERS patients, ini- 
tial symptoms, date of symptom onset, and clinical courses. 


MERS-CoV Real-Time Reverse Transcription PCR Assays 

RNA was extracted from clinical specimens using either a O[Aamp DSP Viral RNA Mini Kit 
(Catalog No. 61904, QIAGEN GmbH) or an automated MagNAPure 96 extraction instrument 
with a total nucleic acid isolation kit (Roche). The extraction was performed according to the 
manufacturers’ instructions and the extracted RNA was stored at —70°C. 

MERS-CoVs were detected in specimens using real-time reverse transcription (rRT)-PCR. 
Extracted RNAs were screened by targeting the upE region, and positive results were con- 
firmed by a subsequent amplification of orfla using a PowerChek MERS Real-Time PCR kit 
(Kogene Biotech, Seoul, South Korea). All rRT-PCR reactions were performed using the 7500 
Fast Real-Time PCR System (Applied Biosystems) with a total reaction volume of 20 pL (15 wl 
of PCR reaction mixture and 5 uL of template RNA). The thermocycling conditions included a 
reverse transcription reaction for 30 min at 50°C, followed by 10 min at 95°C, and then 40 
cycles of 15 sec at 95°C and 60 sec at 60°C. A positive viral template control and a nontem- 
plate control were included in each run. The glyceraldehyde-3-phosphate dehydrogenase 
(GAPDH) gene was amplified simultaneously as an internal control to monitor PCR inhibition. 
A positive result was identified by a well-defined exponential fluorescence curve that crossed 
the defined threshold at <35 cycles in both the upE and ORF 1a assays. 


Reverse Transcription and PCR Amplification for Sequencing 

Five microliters of viral RNA from each sample was used as a template for cDNA synthesis 
using a Superscript Ill First-Strand Synthesis System (Life Technologies). Equal amounts of 
cDNA product were used to perform PCR with a Herculase || Fusion DNA polymerase 
(Agilent Technologies). For full-genome sequencing, primers described by Cotten et al. 
(2013a) were used with minor modifications for efficient amplification. A set of 60 primers 
was used to generate fifteen 2.5-kb overlapping amplicons (four primers per amplicon) 
and three additional primers were added for the extreme termini of the genome 
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(Supplemental Table S5). Primers used for the targeted sequencing are listed in 
Supplemental Table Sé. Primers and nucleotides were removed from the final PCR products 
using a MiniElute PCR purification kit (QIAGEN). 


Sequencing 

The PCR products from each patient were combined into a pool of approximately equal mo- 
larity and subjected to paired-end library construction with a TruSeq Nano DNA Sample 
prep kit (Illumina). Libraries were sequenced using an Illumina MiSeq sequencer (Illumina), 
generating 150-bp paired-end reads. On average, 4113x and 9361x raw read depths 
were achieved for full-genome and targeted deep sequencing, respectively. Sequencing 
metrics including mean depth of coverage of each sample are summarized in Supplemental 
Table S7. Bases were called with MiSeq reporter software (ver. 2.4.60). The reads were 
quality filtered using the command “fastq_quality_filter,” which required a minimum of 
90% bases with a quality score of 20 or higher. Paired-end reads were aligned with the 
National Center for Biotechnology Information (NCBI) MERS-CoV reference sequence 
(NC_019843.3) using the Burrows—Wheeler Aligner (BWA) (Li and Durbin 2010). SAMtools 
v0.1.19 (Li et al. 2009), GATK v2.4-7 (McKenna et al. 2010), and Picard v1.93 were used 
for sorting SAM/BAM files, local realignment, and duplicate markings, respectively. We 
used the SAMtools “mpileup” command to determine the read depth. The consensus se- 
quence was determined from the most commonly expressed nucleotide at each position. 


Phylogenetic Analysis 

We downloaded a total of 105 human MERS-CoV genome sequences from the NCBI 
database (http://www.ncbi.nlm.nih.gov/genomes/VirusVariation/Database/nph-select.cgi? 
cmd=database&taxid=1335626). Combining the eight MERS-CoV genome sequences 
from our study with the downloaded sequences, a phylogenetic analysis was performed after 
aligning the sequences using MUSCLE (http://drive5.com/muscle). A total of 30,130 aligned 
nucleotides were input into MEGA 7.0 (http://www.megasoftware.net) (Tamura et al. 2013), 
and a phylogenetic tree was constructed using the neighbor-joining method. The GenBank 
accession number followed by a brief description of the sequence, including the virus variant 
name and the country of isolation, is provided for each sequence in Supplemental Figure S2. 
Evolutionary distances were computed using the Kimura two-parameter model. The scale 
bar at the bottom of the figure indicates nucleotide substitutions per site. 


Determination of Technical Noise and Significance Test for Variants 

We used X;,,to denote the allele frequency for the ith sample, the jth chromosomal position, 
the base (A, T, G, or C), and the rth replicated experiment. If the total depth was <100x ata 
given ijr, we treated it as missing, irrespective of the b value. We defined the background 
noise value (Nip) as the average of the absolute differences between replicated data for 
all combinations: 


kat Dotck Xjok — Xj Iy=1, fk <h, 
Lei eile , Ik = 0, otherwise. 


Nip = 


Each variant allele frequency was tested by Z with a mean of 


En= (Sr, Ne)/n 
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and a variance of 
Vib = ‘Cae (Nip — Ep)*)/(n — 1), 


when replicates were available for only n samples. 


ADDITIONAL INFORMATION 


Data Deposition and Access 


The seven full-length and one partial virus genomes were deposited at GenBank (http:// 
www.ncbi.nlm.nih.gov/genbank/) under accession numbers KX034093-KX034100. Raw 
sequencing data were deposited in the Sequence Read Archive (SRA; http://www.ncbi. 
nlm.nih.gov/sra) under accession number SRPO72910. 
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